Adaptive Stream Fusion in Multistream Recognition of Speech
Authors
Abstract
A new method to deal with variable distortions of speech during the operation of the system is proposed. First, multiple processing streams are formed by extracting different spectral and temporal modulation components from the speech signal. Information in each stream is used to estimate posterior probabilities of phonemes. Initial values for a weighted integration of these individual estimates are found by normalized cross-correlation of the estimates with the actual phoneme labels on the training data. A statistical model of the final estimated posterior probabilities is used to characterize the system performance. During the operation, the weights in the linear fusion are adapted using particle filtering to optimize the performance. Results on phoneme recognition from noisy speech indicate the effectiveness of the proposed method.
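The initialization and fusion steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the unnormalized-mean form of the cross-correlation (a cosine similarity against one-hot labels), and the rescaling of the weights to sum to one are all assumptions made for illustration. The particle-filter adaptation of the weights is not covered here, since the abstract does not specify the performance model it optimizes.

```python
import math

def normalized_cross_correlation(posteriors, labels, num_classes):
    """Correlate one stream's posteriors with one-hot phoneme labels.

    Assumption: no mean subtraction, i.e. a cosine similarity taken
    over all frames and classes of the training data.
    """
    num = 0.0
    p_energy = 0.0
    y_energy = 0.0
    for frame, label in zip(posteriors, labels):
        for k in range(num_classes):
            y = 1.0 if k == label else 0.0
            num += frame[k] * y
            p_energy += frame[k] ** 2
            y_energy += y
    return num / math.sqrt(p_energy * y_energy)

def initial_weights(streams, labels, num_classes):
    """One initial fusion weight per stream, rescaled to sum to 1 (assumption)."""
    scores = [normalized_cross_correlation(s, labels, num_classes) for s in streams]
    total = sum(scores)
    return [s / total for s in scores]

def fuse(frames, weights):
    """Weighted linear combination of the per-stream posteriors for one frame."""
    num_classes = len(frames[0])
    return [sum(w * f[k] for w, f in zip(weights, frames)) for k in range(num_classes)]

# Hypothetical toy data: two streams, three frames, two phoneme classes.
stream_a = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]  # informative stream
stream_b = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # uninformative stream
labels = [0, 1, 0]

weights = initial_weights([stream_a, stream_b], labels, num_classes=2)
fused = fuse([stream_a[0], stream_b[0]], weights)
```

With this toy data the informative stream receives the larger weight, and because the weights sum to one the fused vector remains a valid probability distribution.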
Similar resources
Toward optimizing stream fusion in multistream recognition of speech.
A multistream phoneme recognition framework is proposed based on forming streams from different spectrotemporal modulations of speech. Phoneme posterior probabilities were estimated from each stream separately and combined at the output level. A statistical model of the final estimated posterior probabilities is used to characterize the system performance. During the operation, the best fusion ...
Audio-Visual Speech Modeling for Continuous Speech Recognition
This paper describes a speech recognition system that uses both acoustic and visual speech information to improve the recognition performance in noisy environments. The system consists of three components: 1) a visual module; 2) an acoustic module; and 3) a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This...
A Framework for Practical Multistream ASR
Robustness of automatic speech recognition (ASR) to acoustic mismatches can be improved by using a multistream architecture. Past multistream approaches involve training a large number of neural networks, one for each possible stream combination. During the testing phase, each utterance is forward passed through all the neural networks to estimate the best stream combination. In this work, we propose a new...
Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition
fMPE is a recently introduced discriminative training technique that uses the Minimum Phone Error (MPE) discriminative criterion to train a feature-level transformation. In this paper we investigate fMPE trained audio/visual features for multistream HMM-based audio-visual speech recognition. A flexible, layer-based implementation of fMPE allows us to combine the visual information with the ...
A multistream multiresolution framework for phoneme recognition
Spectrotemporal representations of speech have already shown promising results in speech processing technologies; however, inherent issues of such representations, such as high dimensionality, have limited their use in speech and speaker recognition. A multistream framework fits such representations very well, since different regions can be separately mapped into posterior probabilities of clas...
Publication date: 2011